Performance characterization of containerization for HPC workloads on InfiniBand clusters: an empirical study


Abstract

Containerization technology offers an appealing alternative for encapsulating and operating applications (and all their dependencies) without the performance penalties of Virtual Machines and, as a result, has attracted the interest of the High-Performance Computing (HPC) community as a way to obtain fast, customized, portable, flexible, and reproducible deployments of its workloads. Previous work in this area has demonstrated that containerized HPC applications can exploit InfiniBand networks, but has ignored the potential of multi-container deployments, which partition the processes that belong to each application into multiple containers in each host. Partitioning HPC applications has proved useful when using virtual machines, by constraining them to a single NUMA (Non-Uniform Memory Access) domain. This paper conducts a systematic study of the performance of multi-container deployments with different network fabrics and protocols, focusing especially on InfiniBand networks. We analyze the impact of container granularity and its potential to exploit processor and memory affinity to improve applications' performance. Our results show that default Singularity can achieve near bare-metal performance but does not support fine-grained multi-container deployments. Docker and Singularity-instance behave similarly in terms of the performance of deployment schemes with different container granularity and affinity. This behavior differs across the several network fabrics and protocols, and also depends on the application's communication patterns and the message size. Moreover, deployments on InfiniBand are more affected by computation and memory allocation and, because of that, they can exploit affinity better.


Similar resources

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This couple...


A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters

OpenSHMEM is an open standard that brings together several long-standing vendor-specific SHMEM implementations and allows applications to use SHMEM in a platform-independent fashion. Several implementations of OpenSHMEM have become available on clusters interconnected by InfiniBand networks, which has gradually become the de facto high performance network interconnect standard. In this paper, w...


Empirical Performance Models for Java Workloads

Java is widely deployed on a variety of processor architectures. Consequently, an understanding of microarchitecture level Java performance is critical to optimize current systems and to aid design and development of future processor architectures for Java. Although this is facilitated by a rich set of processor performance counters featured on several contemporary processors, complex processor...


The Case For Colocation of HPC Workloads

The current state of practice in supercomputer resource allocation places jobs from different users on disjoint nodes both in terms of time and space. While this approach largely guarantees that jobs from different users do not degrade one another’s performance, it does so at high cost to system throughput and energy efficiency. This focused study presents job striping, a technique that signifi...


Network Performance in Distributed HPC Clusters

Linux-based clusters have become prevalent as a foundation for High Performance Computing (HPC) systems. As these clusters become more affordable and available, and with the emergence of high speed networks, it is becoming more feasible to create HPC grids consisting of multiple clusters. One of the attractions of such grids is the potential to scale applications across the various clusters. Th...



Journal

Journal title: Cluster Computing

Year: 2021

ISSN: 1386-7857, 1573-7543

DOI: https://doi.org/10.1007/s10586-021-03460-8